Abstract: In pattern mining, the FP-growth method is an
efficient algorithm for mining both long and short frequent
patterns. Its compact tree structure and partitioning-based
divide-and-conquer search reduce search costs substantially.
However, FP-tree construction is a strictly serial computing
process, and the performance of the algorithm depends on the
database size and on the number of frequent patterns in the
database. Distributed parallel and partitioned computation
techniques are used to address this problem. This method
increases the cost of exchanging and combining control
information on the first execution, but on subsequent item
searches both performance and response time improve.
1. INTRODUCTION

Web mining is the application of machine learning (data
mining) techniques to web-based data for the purpose of
learning or extracting knowledge. It encompasses a wide
variety of techniques, including soft computing. These
methodologies can generally be classified into one of three
distinct categories: web usage mining, web structure mining,
and web content mining. In web usage mining the goal is to
examine web page usage patterns in order to learn about a
web system's users or the relationships between the
documents.

Web usage mining is defined as the process of applying data
mining techniques to the discovery of usage patterns from
web log data in order to identify web users’ behavior. In web
mining, data can be collected at the server side, at the client
side, and at proxy servers.

1.1 Clustering Technique 

Clustering and classification have been useful and active 
areas of machine learning research that promise to help us 
cope with the problem of information overload on the 
Internet. With clustering the goal is to separate a given group 
of data items (the data set) into groups called clusters such 
that items in the same cluster are similar to each other and 
dissimilar to the items in other clusters. 



1.2 K-Means Algorithm

The K-Means algorithm is one of a group of algorithms called
partitioning clustering algorithms. The most commonly used
partitioning clustering strategy is based on the square-error
criterion. The general objective is to obtain the partition
that, for a fixed number of clusters, minimizes the total
square error.
Suppose that the given set of N samples in an n-dimensional
space has somehow been partitioned into K clusters
$\{C_1, C_2, \ldots, C_K\}$. Each $C_k$ has $n_k$ samples and
each sample is in exactly one cluster, so that
$\sum_{k} n_k = N$, where $k = 1, \ldots, K$. The mean vector
$M_k$ of cluster $C_k$ is defined as the centroid of the
cluster

$$M_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ik} \qquad (1)$$

where $x_{ik}$ is the i-th sample belonging to cluster $C_k$.
The square error for cluster $C_k$ is the sum of the squared
Euclidean distances between each sample in $C_k$ and its
centroid. This error is also called the within-cluster
variation [5]:

$$e_k^2 = \sum_{i=1}^{n_k} \lVert x_{ik} - M_k \rVert^2 \qquad (2)$$

The square error for the entire clustering space containing K
clusters is the sum of the within-cluster variations:

$$E_K^2 = \sum_{k=1}^{K} e_k^2 \qquad (3)$$

The basic steps of the K-Means algorithm are:

1. Select an initial partition with K clusters containing
randomly chosen samples, and compute the centroids
of the clusters.

2. Generate a new partition by assigning each sample to
the closest cluster centre.

3. Compute the new cluster centres as the centroids of the
clusters.

4. Repeat steps 2 and 3 until an optimum value of the
criterion function is found or until the cluster
membership stabilizes.
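
The steps above can be sketched in a few lines of Python. This
is a minimal illustration rather than the implementation used
in this work: the function names, the `seed` and `max_iter`
parameters, and the choice to seed the initial partition with
K randomly chosen samples as centroids are all assumptions
made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean vector of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(samples, k, max_iter=100, seed=0):
    """Return (labels, centroids, total square error)."""
    rng = random.Random(seed)
    # Step 1 (assumed variant): K randomly chosen samples act as
    # the centroids of the initial partition.
    centroids = rng.sample(samples, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the closest cluster centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in samples
        ]
        # Step 4: stop once cluster membership stabilizes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centre as the centroid of its cluster.
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centroids[j] = centroid(members)
    # Total square error: sum of the within-cluster variations.
    error = sum(squared_distance(x, centroids[lab])
                for x, lab in zip(samples, labels))
    return labels, centroids, error


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres, error = k_means(points, k=2)
print(labels, centres, error)
```

The returned error is the criterion that the algorithm tries to
minimize; like any K-Means run, the result may be a local
optimum that depends on the initial partition.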

II. RELATED WORK 

FreeSpan is a pattern-growth approach based on database
projection. Although FreeSpan outperforms the Apriori-based
GSP algorithm, it may generate any substring combination in a
sequence, and its projection must keep all sequences of the
original sequence database without length reduction.
The Partitioned Candidate Common Database (PCCD)
algorithm by Zaki et al. is essentially a data distribution
algorithm that supports task parallelism over shared-memory
distributed machines; it is the data distribution version of
the CCPD algorithm.



PrefixSpan, a more efficient pattern-growth algorithm, was
proposed to improve the mining process. The main idea
of PrefixSpan is to examine only the prefix subsequences and
project only their corresponding suffix subsequences into
projected databases.
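
The prefix-projection idea can be illustrated with a short
Python sketch. It is a simplified version, under the assumption
that every element of a sequence is a single item rather than
an item set; the function names and the toy database are
hypothetical and are not taken from the PrefixSpan authors'
implementation.

```python
from collections import Counter


def project(database, item):
    """Keep only the suffix that follows the first occurrence of `item`."""
    projected = []
    for seq in database:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend the pattern
                projected.append(suffix)
    return projected


def prefix_span(database, min_support, prefix=()):
    """Yield (pattern, support) pairs for sequences of single items."""
    # Count each item at most once per sequence.
    support = Counter()
    for seq in database:
        support.update(set(seq))
    for item, count in sorted(support.items()):
        if count >= min_support:
            pattern = prefix + (item,)
            yield pattern, count
            # Recurse only into the suffixes projected on this prefix.
            yield from prefix_span(project(database, item),
                                   min_support, pattern)


db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
for pattern, count in prefix_span(db, min_support=2):
    print(pattern, count)
```

Each recursive call works on a projected database that holds
only suffixes, so the data examined shrinks as the prefix
grows.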

In web usage mining (WUM) the aim is to determine the
behavior of each user, whether registered or not. If a website
requires users to sign in before they can start browsing, it is
easy not only to differentiate between users but also to
identify each individual user. The problem arises when a
website allows visitors to browse its content anonymously,
which is commonplace. In this paper, differentiating between
the activities of individual visitors in a log stored in the
Common Log Format is treated as a challenging task.

III. PROPOSED WORK 

Data mining tasks are used to discover different kinds of
patterns. There are two types of data mining tasks: descriptive
and predictive. Descriptive data mining tasks describe the
general properties of the existing data, whereas predictive
data mining tasks make predictions based on inference from
the available data. The taxonomy of data mining tasks is
shown in Figure 3.1.




Figure 3.1: Data Mining Taxonomy 



A partition of the database refers to any subset of the
transactions contained in the database D. Any two different
partitions are non-overlapping. The approach relies on the
fact that any item set that is potentially frequent in D must
be frequent in at least one of the partitions of D. The
Partition algorithm scans D only twice:

Scan-1: Partition the database and find local frequent patterns.
Scan-2: Consolidate global frequent patterns.

Initially the database D is logically partitioned into n
partitions.



Phase-I: reads the entire database once and takes n iterations
input: p_i, where i = 1, ..., n
output: local large item sets of all lengths



Merge Phase:

input: local large item sets of the same length from all n
partitions

output: the global candidate item sets, obtained by combining
the local large item sets. The set of global candidate item
sets of length j is computed as
$C_j^G = \bigcup_{i=1}^{n} L_j^i$, where $L_j^i$ denotes the
local large item sets of length j found in partition p_i.

Phase-II: reads the entire database again and takes n iterations
input: p_i, where i = 1, ..., n

output: a counter for each global candidate item set, holding
its support

Algorithm output: the item sets that have the minimum global
support, along with their support. The algorithm reads the
entire database twice.
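
The two scans can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the
implementation evaluated in this paper: local large item sets
are found by brute-force subset enumeration (practical only for
short transactions), `min_support` is a fraction applied both
per partition and globally, and all function names are
hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_support):
    """Phase I for one partition: item sets that are large locally."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        # Brute force: count every non-empty subset of the transaction.
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(partition)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def partition_mine(database, n_partitions, min_support):
    """Return {item set: global support count} using two scans."""
    # Logically split D into n non-overlapping partitions.
    size = -(-len(database) // n_partitions)  # ceiling division
    partitions = [database[i:i + size]
                  for i in range(0, len(database), size)]

    # Scan 1 and merge phase: union of the local large item sets.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_support)

    # Scan 2: count the global support of every candidate.
    counts = Counter()
    for part in partitions:
        for transaction in part:
            items = set(transaction)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] += 1

    threshold = min_support * len(database)
    return {cand: c for cand, c in counts.items() if c >= threshold}


db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
print(partition_mine(db, n_partitions=2, min_support=0.5))
```

Because an item set that is frequent in D must be frequent in
at least one partition, the merge step cannot miss a globally
frequent item set; the second scan only removes false
candidates.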

IV. EXPERIMENTAL RESULT 

All the experiments were performed on a 3-GHz Pentium PC
with 1 GB of main memory, running Microsoft Windows/NT.
All the programs were written in Microsoft Visual Studio
2010. The experiments were carried out on both synthetic and
real data sets. The results presented in this paper use
frequent sequential patterns to capture similar access
patterns among users.
Figure 4.1: Execution of FP-growth with Support and Confidence



Figure 4.1 shows the output after execution of the complete
algorithm. The snapshot displays a menu through which the
user can interact easily, and a message prompts the user for a
choice. When the user selects the first option, a message asks
for the number of transactions to be added, after which the
user enters the IDs of the items.
Figure 4.2: Rules Generation of Items with Menu Interaction

Figure 4.2 shows the output after execution of the complete
algorithm. The snapshot displays a menu through which the
user can interact easily, and a message prompts the user for a
choice. When the user selects the second option, all the
transactions are displayed with their items and transaction
IDs.




Figure 4.3: Conditional Pattern Tree Projection with Rule
Generation of Transaction

Figure 4.3 shows the output after execution of the complete
algorithm. The snapshot displays a menu through which the
user can interact easily, and a message prompts the user for a
choice. When the user selects the “View Transaction” option,
the output shows the conditional pattern of the FP-growth
tree.

4.1 TREE WITH DATABASE PARALLEL 
PROJECTION 




Figure 4.4: Parallel Projection with Transaction-Database
using Support of Items

Figure 4.4 shows the output after execution of the complete
algorithm. The snapshot displays a menu through which the
user can interact easily, and a message prompts the user for a
choice. When the user selects an option, the corresponding
operation is performed.

4.2 FP-GROWTH TREE WITH DATABASE 
PARTITION PROJECTION 

Figure 4.5 shows the output after execution of the complete
algorithm. The snapshot displays a menu through which the
user can interact easily, and a message prompts the user for a
choice. When the user selects an option, the corresponding
operation is performed.




Figure 4.5: Partition Projection with Transaction-Database
using Support of Items



4.3 COMPARISON ON THE BASIS OF EXECUTION
TIME AND MINIMUM SUPPORT COUNT AMONG
CONDITIONAL PATTERN, PARALLEL
PROJECTION AND DATABASE PARTITION
PROJECTION OF FP-GROWTH TREE



Number of   Time taken to execute (in ms)
Records     FP-Growth Tree      FP-Growth Tree with   FP-Growth Tree with
            with Conditional    Database Parallel     Database Partition
            Pattern             Projection            Projection

2           102                 133                   152
3           85                  123                   132
4           60                  85                    96
5           45                  73                    79



Table 4.1: Comparison among Conditional Pattern, 
Database Parallel Projection and Database Partition 
Projection of FP-Growth Trees with different support 



Table 4.1 compares Conditional Pattern, Database Parallel
Projection and Database Partition Projection for different
support values. In every row of the table, the execution time
of conditional pattern projection is lower than that of
parallel and partition projection of the FP-Growth Tree.

Figure 4.6 shows the comparison among conditional pattern, 
database parallel projection and database partition projection 
of FP-Growth Tree with different support. 




Figure 4.6: Comparison among Conditional Pattern,
Parallel Projection and Partition Projection of FP-Tree
with different support



4.4 COMPARISON ON THE BASIS OF NUMBER
OF RECORDS AND MINIMUM SUPPORT COUNT
BETWEEN FP-GROWTH TREE CONDITIONAL
PATTERN, FP-GROWTH TREE WITH DATABASE
PARALLEL PROJECTION AND FP-GROWTH TREE
WITH DATABASE PARTITION PROJECTION
Number of   Time taken to execute (in ms)
Records     FP-Growth Tree      FP-Growth Tree with   FP-Growth Tree with
            with Conditional    Database Parallel     Database Partition
            Pattern             Projection            Projection

200         67                  85                    92
300         70                  97                    103
400         130                 145                   157
500         187                 197                   221



Table 4.2: Comparison among Conditional Pattern, 
Database Parallel Projection and Database Partition 
Projection of FP-Growth Trees with number of records 



Table 4.2 compares Conditional Pattern, Database Parallel
Projection and Database Partition Projection for different
numbers of records. In every row of the table, the execution
time of conditional pattern projection is lower than that of
parallel and partition projection of the FP-Growth Tree.
Figure 4.7: Comparison among Conditional Pattern,
Database Parallel Projection and Database Partition
Projection with different number of records

Figure 4.7 shows the comparison among conditional pattern,
database parallel projection and database partition projection
of FP-Growth Tree for different numbers of records. We
found that the execution time of conditional pattern
projection is the lowest of the three.

V. CONCLUSION 

FP-tree achieves good compactness most of the time.
Especially in dense datasets, it can compress the database
many times. Clearly, there is some overhead for pointers and
counters. However, the gain from sharing among frequent
projections of transactions is substantially greater than the
overhead, which makes the FP-tree more space-efficient in
many cases. When support is very low, the FP-tree becomes
bushy. In such cases, the degree of sharing in the branches of
the FP-tree becomes low, and the overhead of links makes the
FP-tree large. Therefore, instead of building an FP-tree, we
should construct projected databases. That is why we build an
FP-tree for a transaction database or projected database
only when it passes a certain density threshold. From the
experiments, one can see that such a threshold is pretty low
and easy to reach. Therefore, even for a very large and/or
sparse database, after one or a few rounds of database
projection, the FP-tree can be used for all the remaining
mining tasks. Our experiments employed an implementation of
this kind. Finally, the FP-Growth Tree is more efficient
than tree projection, but it is difficult to maintain in
memory, so tree projection is used. Two types of projection
are used: parallel projection performs well but takes more
memory, whereas partition projection takes more execution
time but less space than parallel projection.
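
The space trade-off between the two projection schemes can be
illustrated with a small Python sketch. It follows the usual
description of these schemes and is not the implementation
measured above: parallel projection is assumed to copy the
prefix of a transaction into the projected database of every
frequent item it contains, while partition projection is
assumed to copy it only into the projected database of its
last item in frequency order. All names and the toy
transactions are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_ordered(transactions, min_count):
    """Drop infrequent items and sort the rest by descending support."""
    support = Counter(item for t in transactions for item in set(t))
    ordered = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    rank = {item: r for r, (item, c) in enumerate(ordered)
            if c >= min_count}
    return [sorted((i for i in set(t) if i in rank), key=rank.get)
            for t in transactions]


def parallel_projection(ordered):
    """Project each transaction onto every item it contains."""
    projected = defaultdict(list)
    for t in ordered:
        for pos in range(1, len(t)):
            projected[t[pos]].append(t[:pos])
    return projected


def partition_projection(ordered):
    """Project each transaction onto its last item only."""
    projected = defaultdict(list)
    for t in ordered:
        if len(t) > 1:
            projected[t[-1]].append(t[:-1])
    return projected


def stored_items(projected):
    """Total number of items held across all projected databases."""
    return sum(len(t) for db in projected.values() for t in db)


data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c", "d"]]
ordered = frequent_ordered(data, min_count=2)
print(stored_items(parallel_projection(ordered)))
print(stored_items(partition_projection(ordered)))
```

With partition projection, a transaction reaches the projected
databases of its other items only later, after the database it
currently sits in has been processed. This keeps less data in
memory at once but requires additional projection passes.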